This is a test. Let me know what you think.

Setting workspace

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## [1] "ReadMe"       "Raw_Data"     "Clean_Data"   "Cleaning_Log" "Deletions"
## New names:
## * enrolled_boys_6_11...258 -> enrolled_boys_6_11...259
## * enrolled_boys_6_11...333 -> enrolled_boys_6_11...335

Checking your cleaning log

I am checking your cleaning log against your clean data.

## Joining, by = "binding"
dd$check_clean %>% table(useNA = "ifany")
## .
## FALSE  TRUE  <NA> 
##  4376 11885  9921

Out of your 26k entries (26,182), about 12k are correct, 4k are incorrect and 10k are NA.

NA’s

Let’s now focus on the NAs.

dd %>% 
  filter(is.na(check_clean)) %>% 
  group_by(Question) %>%
  tally() %>%
  arrange(desc(n))

How to read

dd %>% head()

Question: from your cleaning log
uuid: uuid
Old Value: from your cleaning log
New value: from your cleaning log
Reason: from your cleaning log
value_raw: value from raw dataset
value_clean: value from clean dataset
check_clean: TRUE if New Value (from cleaning log) is the same as value_clean (from clean data), FALSE if they differ, NA if either is missing
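
A minimal sketch of why check_clean has three outcomes, assuming it is computed as `New Value` == value_clean (as in the re-run further down). The toy values are made up for illustration. In R, a comparison involving a missing value returns NA rather than FALSE.

```r
library(dplyr)

# Toy cleaning-log rows (hypothetical values, not from the real data)
toy <- tibble(
  `New Value` = c("yes", "no", NA, "other"),
  value_clean = c("yes", "yes", "no", NA)
)

toy <- toy %>%
  mutate(check_clean = `New Value` == value_clean)

# TRUE when the values match, FALSE when they differ,
# NA when either side of the comparison is missing
table(toy$check_clean, useNA = "ifany")
```

So an NA here does not mean the cleaning is wrong, only that one of the two values is empty.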

ALL

logg %>% filter(Question == "All") %>% mutate(uuid_dup = duplicated(uuid)) 

There are 1681 surveys deleted; there are 1676 in the deletion log.
There are 9 duplicated uuids among the deletions in the cleaning log.

Action: none needed.
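
If you ever want to reconcile the two counts, here is a rough sketch on toy data. The objects `log_del` (deletion entries from the cleaning log) and `deletions` (the deletion log) are hypothetical names, not objects from this report.

```r
library(dplyr)

# Hypothetical toy data standing in for the two sources
log_del   <- tibble(uuid = c("a", "b", "b", "c", "d"))
deletions <- tibble(uuid = c("a", "b", "c"))

# uuids logged for deletion more than once in the cleaning log
dup_uuid <- log_del %>%
  count(uuid) %>%
  filter(n > 1)

# uuids deleted in the cleaning log but absent from the deletion log
missing_in_deletions <- log_del %>%
  distinct(uuid) %>%
  anti_join(deletions, by = "uuid")
```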

sanitation_features_other:

dd %>% filter(Question == "sanitation_features_other", is.na(check_clean)) 

It seems that about 10% of households do not have toilets.

Action: to be checked against the skip logic.

main_source_water_other

dd %>% filter(Question == "main_source_water_other", is.na(check_clean)) 
dd %>% 
  filter(Question == "main_source_water_other", is.na(check_clean)) %>% 
  select(uuid, `New Value`) %>% 
  left_join(select(cleann, main_source_water, uuid)) 
## Joining, by = "uuid"

Action: explore what happened. Was the “other” recoded?

when_arrived_current_location

dd %>% filter(Question == "when_arrived_current_location", is.na(check_clean)) %>% 
  View()

In your cleaning log, the “New Value” is empty, while there is a value in the clean dataset. No reason is given.
Action: explore what happened.

when_leave_place_origin

dd %>% filter(Question == "when_leave_place_origin", is.na(check_clean)) %>% 
  View()

Same as above.

67

dd %>% filter(Question == "67", is.na(check_clean))   

Not sure what the 67 variable is for, but same as above. In addition, all of them were turned to 67 although the values were different.
Action: investigate on your side.

sanitation_facilities_problems_other

dd %>% filter(Question == "sanitation_facilities_problems_other", is.na(check_clean)) %>%
  pull(Reason) %>% table()
## .
##                          Clarification from enumerator 
##                                                    339 
## Clarification from enumerator + Translating to English 
##                                                     16 
##                                  other options recoded 
##                                                      4 
##               other options recoded in the choice list 
##                                                     22 
##                                 Translating to English 
##                                                    272
dd %>% filter(Question == "sanitation_facilities_problems_other", is.na(check_clean), 
              Reason != "other options recoded", 
              Reason != "other options recoded in the choice list") 
dd %>% 
  filter(Question == "sanitation_facilities_problems_other", is.na(check_clean), 
         Reason != "other options recoded", 
         Reason != "other options recoded in the choice list") %>% 
  select(uuid, `New Value`) %>% 
  left_join(select(cleann, sanitation_facilities_problems, uuid)) %>% 
  select(`New Value`, sanitation_facilities_problems) %>% table()
## Joining, by = "uuid"
##                                                   sanitation_facilities_problems
## New Value                                          latrines_too_far other other
##   Big problems                                                          0     2
##   Dont have sanitation facilities                                       0     4
##   Dont have toilet                                                      0   216
##   Dont have toilet, need to dig and build a toilet                      0     2
##   Dont have toilet, share with a family                                 0     2
##   Go to open spaces                                                     0     2
##   Go to open spaces'                                                    0     2
##   Goto op[en spaces                                                     0     2
##   No , dont have toilet                                                 0    16
##   No sanitation facilities                                              0     4
##   None of the above                                                     0     2
##   Only have and open bit                                                0     2
##   open defication                                                       0     1
##   out toilet needs renovation                                           0     2
##   There are no toilets                                                  0     2
##   There is no sanitation facility                                       0    22
##   Use with neighbour                                                    0     2
##   We dont have toilet cleaning materials                                0     0
##   We go out at night                                                    0     0
##   we used open deficaiton                                               0    19
##   Yes                                                                   0     2
##                                                   sanitation_facilities_problems
## New Value                                          other facilities_too_crowded
##   Big problems                                                                0
##   Dont have sanitation facilities                                             0
##   Dont have toilet                                                            2
##   Dont have toilet, need to dig and build a toilet                            0
##   Dont have toilet, share with a family                                       0
##   Go to open spaces                                                           0
##   Go to open spaces'                                                          0
##   Goto op[en spaces                                                           0
##   No , dont have toilet                                                       0
##   No sanitation facilities                                                    0
##   None of the above                                                           0
##   Only have and open bit                                                      0
##   open defication                                                             0
##   out toilet needs renovation                                                 0
##   There are no toilets                                                        0
##   There is no sanitation facility                                             0
##   Use with neighbour                                                          0
##   We dont have toilet cleaning materials                                      0
##   We go out at night                                                          2
##   we used open deficaiton                                                     0
##   Yes                                                                         0
##                                                   sanitation_facilities_problems
## New Value                                          unclean_unhygienic other
##   Big problems                                                            0
##   Dont have sanitation facilities                                         0
##   Dont have toilet                                                        0
##   Dont have toilet, need to dig and build a toilet                        0
##   Dont have toilet, share with a family                                   0
##   Go to open spaces                                                       0
##   Go to open spaces'                                                      0
##   Goto op[en spaces                                                       0
##   No , dont have toilet                                                   0
##   No sanitation facilities                                                0
##   None of the above                                                       0
##   Only have and open bit                                                  0
##   open defication                                                         0
##   out toilet needs renovation                                             0
##   There are no toilets                                                    0
##   There is no sanitation facility                                         0
##   Use with neighbour                                                      0
##   We dont have toilet cleaning materials                                  2
##   We go out at night                                                      0
##   we used open deficaiton                                                 0
##   Yes                                                                     0

Same as for main source of water: check whether recoding happened or should happen.
Action: explore what happened when recoding “other”.

shelter_issues_other

dd %>% 
  filter(Question == "shelter_issues_other", is.na(check_clean)) 
dd %>% 
  filter(Question == "shelter_issues_other", is.na(check_clean))%>% 
  select(uuid, `New Value`) %>% 
  left_join(select(cleann, shelter_issues, uuid)) %>% 
  select(`New Value`, shelter_issues) %>% table()
## Joining, by = "uuid"
##                      shelter_issues
## New Value             lack_bathing
##   Its nice                       0
##   Its normal                     0
##   My house is safe               0
##   no issue                       0
##   No problem                     0
##   No problems                    0
##   No, No problem                 4
##   None                           0
##   None of the above              0
##   Nothing                        0
##   Thanks to Allah                0
##   There is no issue              0
##   There is no problem            0
##                      shelter_issues
## New Value             lack_bathing lack_cooking lack_lights_inside lack_lights_outside
##   Its nice                                                                           0
##   Its normal                                                                         0
##   My house is safe                                                                   0
##   no issue                                                                           0
##   No problem                                                                         0
##   No problems                                                                        0
##   No, No problem                                                                     0
##   None                                                                               2
##   None of the above                                                                  0
##   Nothing                                                                            0
##   Thanks to Allah                                                                    0
##   There is no issue                                                                  0
##   There is no problem                                                                0
##                      shelter_issues
## New Value             lack_bathing lack_cooking lack_lights_outside
##   Its nice                                                        0
##   Its normal                                                      0
##   My house is safe                                                0
##   no issue                                                        0
##   No problem                                                      0
##   No problems                                                     0
##   No, No problem                                                  2
##   None                                                            0
##   None of the above                                               0
##   Nothing                                                         0
##   Thanks to Allah                                                 0
##   There is no issue                                               0
##   There is no problem                                             0
##                      shelter_issues
## New Value             lack_cooking lack_cooking other lack_lights_outside
##   Its nice                       0                  0                   0
##   Its normal                     0                  0                   0
##   My house is safe               0                  0                   0
##   no issue                       0                  0                   0
##   No problem                     0                  0                   0
##   No problems                    0                  0                   0
##   No, No problem                 0                  0                   2
##   None                           2                  0                   0
##   None of the above              0                  0                   0
##   Nothing                        0                  0                   0
##   Thanks to Allah                0                  0                   0
##   There is no issue              0                  0                   0
##   There is no problem            0                  0                   0
##                      shelter_issues
## New Value             lack_lights_outside lack_lights_inside
##   Its nice                                                 0
##   Its normal                                               0
##   My house is safe                                         0
##   no issue                                                 0
##   No problem                                               0
##   No problems                                              0
##   No, No problem                                           2
##   None                                                     0
##   None of the above                                        0
##   Nothing                                                  0
##   Thanks to Allah                                          0
##   There is no issue                                        0
##   There is no problem                                      0
##                      shelter_issues
## New Value             lack_lights_outside unsafe_bathing lack_privacy
##   Its nice                                             0            0
##   Its normal                                           0            0
##   My house is safe                                     0            0
##   no issue                                             0            0
##   No problem                                           0            0
##   No problems                                          0            0
##   No, No problem                                       2            0
##   None                                                 0            2
##   None of the above                                    0            0
##   Nothing                                              0            0
##   Thanks to Allah                                      0            0
##   There is no issue                                    0            0
##   There is no problem                                  0            0
##                      shelter_issues
## New Value             lack_space lack_privacy lack_lights_outside lack_lights_inside unsafe_cooking
##   Its nice                                                                                        0
##   Its normal                                                                                      0
##   My house is safe                                                                                0
##   no issue                                                                                        0
##   No problem                                                                                      0
##   No problems                                                                                     0
##   No, No problem                                                                                  2
##   None                                                                                            0
##   None of the above                                                                               0
##   Nothing                                                                                         0
##   Thanks to Allah                                                                                 0
##   There is no issue                                                                               0
##   There is no problem                                                                             0
##                      shelter_issues
## New Value             other other lack_lights_inside unsafe_bathing
##   Its nice                2                        0              0
##   Its normal              2                        0              0
##   My house is safe        2                        0              0
##   no issue                0                        0              0
##   No problem              8                        0              0
##   No problems             6                        0              0
##   No, No problem         98                        0              2
##   None                   52                        0              0
##   None of the above       2                        0              0
##   Nothing                 2                        0              0
##   Thanks to Allah         4                        0              0
##   There is no issue       2                        0              0
##   There is no problem    32                        0              0
##                      shelter_issues
## New Value             unsafe_bathing other
##   Its nice                               0
##   Its normal                             0
##   My house is safe                       0
##   no issue                               0
##   No problem                             0
##   No problems                            0
##   No, No problem                         2
##   None                                   0
##   None of the above                      0
##   Nothing                                0
##   Thanks to Allah                        0
##   There is no issue                      0
##   There is no problem                    0

It seems these answers all mean “no issues”. Should they be removed, as you did, or recoded instead?
Action: investigate whether the re-coding is correct.

hh_main_source_income_other

dd %>% 
  filter(Question == "hh_main_source_income_other", is.na(check_clean)) %>%
  select(uuid, `Old Value`) %>% 
  left_join(select(cleann, shelter_issues, uuid))
## Joining, by = "uuid"
dd %>% 
  filter(Question == "hh_main_source_income_other", is.na(check_clean)) %>% 
  pull(Reason) %>%
  table() 
## .
##                  Anwser was choice list              Clarification from choices 
##                                      98                                     177 
##           Clarification from enumerator          Clarification from translation 
##                                       1                                      48 
##                      irrelevent entries    other options recoded in the choices 
##                                      58                                      10 
##     other options recoded inthe choices                             reclasified 
##                                       8                                      11 
## response already in the list of choices 
##                                       3
dd %>% 
  filter(Question == "hh_main_source_income_other", is.na(check_clean)) %>% 
  select(uuid, `Old Value`) %>% 
  left_join(select(cleann, starts_with("hh_main_source_income"), uuid)) 
## Joining, by = "uuid"

Do “Clarification from choices” and “Clarification from translation” mean re-classified?
Action: check whether the clarifications were correctly re-coded/classified.

common_type_ids_other

dd %>% 
  filter(Question == "common_type_ids_other", is.na(check_clean)) %>%
  select(uuid, `Old Value`, `New Value`) %>% 
  left_join(select(cleann, starts_with("common_type_ids"), uuid)) 
## Joining, by = "uuid"

Same as the other “other” questions.
Action: check the recoding and act accordingly.

General comment

For all the “other” answers that are removed because they mean “none”, possibly because of missing skip logic during the coding: do you also need to remove all the related questions, or add a “none” option to your select multiple? Or do you just want to report it as is and include the “no issues” in the denominator? For example, do you want to report:
(1) XX% of households reported having an issue. Of those reporting an issue, YY% reported OPTION1 as an issue.
(2) Among all households, XX% reported no issues and YY% reported OPTION1 as an issue.
For (1), you would want to recode the “none” as missing for all options so that you can use the skip logic.
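
To make the two reporting options concrete, here is a small sketch on toy data. The data frame `hh` and its 0/1 columns `no_issue`, `option1` and `option2` are hypothetical, not columns from your dataset.

```r
library(dplyr)

# Hypothetical toy data: one row per household,
# select-multiple answers stored as 0/1 columns
hh <- tibble(
  no_issue = c(1, 1, 0, 0, 0),
  option1  = c(0, 0, 1, 1, 0),
  option2  = c(0, 0, 0, 1, 1)
)

# Option (1): share of households with any issue, then
# OPTION1 among those households only (skip-logic denominator)
pct_any_issue <- mean(hh$no_issue == 0) * 100
pct_option1_among_issue <- hh %>%
  filter(no_issue == 0) %>%
  summarise(pct = mean(option1) * 100) %>%
  pull(pct)

# Option (2): everything reported over all households
pct_no_issue_all <- mean(hh$no_issue) * 100
pct_option1_all  <- mean(hh$option1) * 100
```

The same OPTION1 answers give a different percentage depending on the denominator, so the choice should be made before recoding the “none” answers.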

Wrong cleaning values

dd %>% filter(check_clean == FALSE)
dd %>% filter(check_clean == FALSE) %>% nrow()
## [1] 4376

re-coding FALSE and TRUE to 0/1

logg2 <- logg %>% 
  mutate(`New Value` = ifelse(`New Value` == "FALSE", 0, `New Value`), 
         `New Value`= ifelse(`New Value` == "TRUE", 1, `New Value`))

old_new_values2 <- mapply(old_new, 
                          cleaning_log = split(logg2, row.names(logg2)), 
                          variable = "Question", 
                          MoreArgs = list(
                            data_raw = raww,
                            data_clean = cleann,
                            uuid_raw = "uuid",
                            uuid_clean = "uuid",
                            uuid_cleaning_log = "uuid"
                          ),
                          SIMPLIFY = F) %>% do.call(rbind, .)
dd2 <- logg2 %>% 
  mutate(binding = paste0(uuid, Question)) %>%
  left_join(old_new_values2) %>% 
  mutate(check_raw = `Old Value` == value_raw,
         check_clean = `New Value` == value_clean) %>%
  select(-c(binding, ID, `Follow-up`, Enumerator, Community, `Modified by?`, Notes, check_raw), 
         uuid, Question,`Old Value`, `New Value`, value_raw, value_clean, check_clean, Reason)
## Joining, by = "binding"
dd2 %>% filter(check_clean == FALSE) %>% nrow()
## [1] 705

After recoding TRUE/FALSE to 1/0, we are down to 705 wrong values!

dd2 %>% filter(check_clean == FALSE)

It seems "casuel _labour _Wages _construction _etc" has a typo.

dd2 %>% filter(check_clean == FALSE, 
               `New Value` != "casuel _labour _Wages _construction _etc") %>% nrow()
## [1] 535

Down to 535.

dd2 %>% filter(check_clean == FALSE, 
               `New Value` != "casuel _labour _Wages _construction _etc")

It seems some values are off.
Action: please check those.

cleaninginspectoR

inspec <- cleaninginspectoR::inspect_all(cleann, "uuid")

It seems you have lots of outliers.

inspec %>% 
  filter(issue_type %in% c("log normal distribution outlier", "normal distribution outlier")) %>%
  group_by(variable) %>%
  tally()

Action: Please check all those variables for LOW and HIGH values.
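
As a starting point for that check, here is a sketch that lists the lowest and highest flagged value per variable. It runs on a toy stand-in for `inspec`; the presence of a numeric `value` column next to `variable` and `issue_type` is an assumption, and the toy variable names are made up.

```r
library(dplyr)

# Hypothetical stand-in for `inspec`; the `value` column is an assumption
inspec_toy <- tibble(
  variable   = c("income", "income", "hh_size", "hh_size"),
  value      = c(0.2, 90000, 45, 38),
  issue_type = c("log normal distribution outlier",
                 "log normal distribution outlier",
                 "normal distribution outlier",
                 "normal distribution outlier")
)

# Number of flagged entries and their range, per variable
outlier_range <- inspec_toy %>%
  filter(issue_type %in% c("log normal distribution outlier",
                           "normal distribution outlier")) %>%
  group_by(variable) %>%
  summarise(n_flagged = n(),
            lowest    = min(value),
            highest   = max(value))
```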

Extra check:

calculations

I don’t have the KOBO, so I am only checking the obvious ones.

calculation total hh

cleann %>% 
  select(males_0_2y:total_hh) %>% 
  mutate(across(everything(), as.numeric)) %>% 
  mutate(total_cal = rowSums(across(males_0_2y:females_60_older)), 
         check = total_cal == total_hh) %>% 
  pull(check) %>% 
  table(useNA = "ifany")
## .
##  TRUE 
## 11645

All good
Action: none needed

age_pregnent_female_give_birth

cleann$age_pregnent_female_give_birth %>% table(useNA = "ifany")
## .
##    1   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26 
##    3   21    2    2    8    7   23   51   42   99   36   72   61   68  182   59 
##   27   28   29   30   31   32   33   34   35   36   37   38   39   40   41   42 
##   78  120   85  210   53   95   49   33  172   28   40   50   20   43   12    8 
##   43   44   45   46   48   50   55 <NA> 
##    7    1   10    2    1    2    1 9789

Action: change the 3 entries with an age of 1 to NA?
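
If you decide to set them to NA, one possible way is `dplyr::na_if()`. This is a sketch on toy data and assumes the column is numeric; whether 1 is really an entry error is for you to confirm.

```r
library(dplyr)

# Toy ages (hypothetical values); 1 is treated here as an entry error
toy <- tibble(age_pregnent_female_give_birth = c(1, 18, 25, NA, 1))

# Replace the value 1 with NA and leave everything else untouched
toy <- toy %>%
  mutate(age_pregnent_female_give_birth =
           na_if(age_pregnent_female_give_birth, 1))
```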

checks outliers LOW

I am not sure what estimate_hh_income refers to, but all the estimate_hh_income.XXXX variables have very low values, e.g. people earning 0.20 dollars. See below for anything below 5 (USD?), zeros excluded.

cleann %>% 
  select(uuid, starts_with("estimate_hh_income.")) %>% 
  mutate(across(starts_with("estimate_hh_income."), ~(. < 5 & . > 0), .names = "{.col}_less_5"))

Action: please check those values.
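
To get a list of the offending values rather than one flag column per variable, a long format may be easier to read. This is a sketch on toy data: the column suffixes are made up, and it assumes the income columns are numeric.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; the suffixes after "estimate_hh_income." are made up
toy <- tibble(
  uuid = c("a", "b", "c"),
  estimate_hh_income.salary      = c(0.2, 150, 0),
  estimate_hh_income.remittances = c(300, 4, NA)
)

# One row per uuid and income variable, keeping only values in (0, 5)
low_values <- toy %>%
  pivot_longer(starts_with("estimate_hh_income."),
               names_to = "variable", values_to = "value") %>%
  filter(value > 0, value < 5) %>%
  arrange(value)
```

Zeros and NAs are dropped by the filter, so only the suspiciously low positive amounts remain.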